

@jiribenes
Contributor

@jiribenes commented Dec 5, 2025

Resolves #462

The parser soft fails (fails with recovery) if it encounters a binary operator that is not surrounded by whitespace on both sides:

// OK
val _ = 1 + 2
val _ = 1  + 2
val _ = 1 +  2
val _ = 1 +
  2

// FAIL
val _ = 1 +2
val _ = 1+ 2
val _ = 1+2

The implementation is a bit hacky; see the note in the comment.
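The rule can be sketched in Scala. The sketch assumes that whitespace stays in the token stream as its own tokens, with a run of spaces becoming one Space token and a line break becoming a Newline token. An operator is accepted only if the tokens directly before and after it are whitespace. `Kind`, `isSpace` and `spacedBinop` are hypothetical stand-ins, not the Effekt lexer's real types.

```scala
// Hypothetical token kinds for illustration; a real lexer has many more.
enum Kind:
  case Num, Plus, Space, Newline

// Newlines count as whitespace, so a line break right after the operator is accepted.
def isSpace(kind: Kind): Boolean =
  kind == Kind.Space || kind == Kind.Newline

// True if the operator at index `pos` has a whitespace token directly before and after it.
def spacedBinop(tokens: Vector[Kind], pos: Int): Boolean =
  val wsBefore = pos > 0 && isSpace(tokens(pos - 1))
  val wsAfter  = pos + 1 < tokens.length && isSpace(tokens(pos + 1))
  wsBefore && wsAfter

@main def binopDemo(): Unit =
  import Kind.*
  // Hand-written token streams for some of the examples above.
  val examples = List(
    "1 + 2"     -> Vector(Num, Space, Plus, Space, Num),
    "1 +\\n  2" -> Vector(Num, Space, Plus, Newline, Space, Num),
    "1 +2"      -> Vector(Num, Space, Plus, Num),
    "1+ 2"      -> Vector(Num, Plus, Space, Num),
    "1+2"       -> Vector(Num, Plus, Num)
  )
  for (src, tokens) <- examples do
    val verdict = if spacedBinop(tokens, tokens.indexOf(Plus)) then "OK" else "FAIL"
    println(s"$src: $verdict")
```

Because only the immediate neighbours are inspected, `1  + 2` and `1 +  2` pass as long as each run of spaces is lexed as a single whitespace token.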

@jiribenes added the labels experiment (Experimental branch, do not merge!) and area:parser/lexer on Dec 5, 2025
@jiribenes

This comment was marked as resolved.

@jiribenes removed the experiment (Experimental branch, do not merge!) label on Dec 9, 2025
Comment on lines -144 to +157
/**
 * Negative lookahead
 */
def lookbehind(offset: Int): Token =
  tokens(position - offset)

def sawNewlineLast: Boolean = {
  @tailrec
  def go(position: Int): Boolean =
    if position < 0 then fail("Unexpected start of file")

    tokens(position).failOnErrorToken(position) match {
      case token if isSpace(token.kind) && token.kind != Newline => go(position - 1)
      case token if token.kind == Newline => true
      case _ => false
    }

  go(position - 1)
}
Contributor Author

Previously we checked lookbehind(1) == Newline, but that no longer works: the lookbehind would need to skip all whitespace tokens except newlines. That seemed too weird, so I replaced it with a specialised function that just checks whether the last non-space token was a newline (feel free to suggest a better name).
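The backwards scan can be shown in isolation as a sketch over a plain vector of token kinds. `Tok` is a hypothetical enum, and this version returns false at the start of the input, while the snippet above fails with "Unexpected start of file" at that point.

```scala
import scala.annotation.tailrec

// Hypothetical token kinds, for illustration only.
enum Tok:
  case Ident, Op, Space, Newline

// Walk backwards from the token before `position`, skipping non-newline whitespace,
// and report whether the first token where the walk stops is a newline.
def sawNewlineLast(tokens: Vector[Tok], position: Int): Boolean =
  @tailrec
  def go(i: Int): Boolean =
    if i < 0 then false // start of input; the PR's version fails with an error here instead
    else
      tokens(i) match
        case Tok.Space   => go(i - 1)
        case Tok.Newline => true
        case _           => false
  go(position - 1)
```

For the stream `Ident, Newline, Space, Op`, calling `sawNewlineLast(tokens, 3)` returns true. A plain `lookbehind(1) == Newline` check would see the `Space` at index 2 and report false, which is the breakage described in the comment.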

Comment on lines 1066 to 1075
// Check that the current token is surrounded by whitespace. If not, soft fail.
private def checkBinaryOpWhitespace(): Unit = {
  // position points to the operator token in the raw token array
  val wsBefore = position > 0 && isSpace(tokens(position - 1).kind)
  val wsAfter = position + 1 < tokens.length && isSpace(tokens(position + 1).kind)

  if (!wsBefore || !wsAfter) {
    softFail("Missing whitespace around binary operator", position, position)
  }
}
Contributor Author

Now I'm very happy about the implementation :)

@jiribenes requested review from b-studios and dvdvgt and removed the request for dvdvgt on December 9, 2025 21:13
@jiribenes marked this pull request as ready for review on December 9, 2025 21:13
@jiribenes requested a review from dvdvgt on December 9, 2025 21:17
Collaborator

@dvdvgt left a comment

Overall, this looks good to me. I am in favor of deferring whitespace checks around operators to the parser instead of the lexer. Just some minor comments below.

  else
    Token(tokenStartPosition.offset, position.offset - 1, kind)

private def skipWhitespace(): Unit =
Collaborator

Should this maybe be called skipNewline instead?

Contributor Author

I removed it outright; I think it's no longer needed.

Comment on lines 388 to 391
case (' ', _) => advanceSpaces()
case ('\t', _) => advanceSpaces()
case ('\n', _) => advanceWith(TokenKind.Newline)
case ('\r', '\n') => advance2With(TokenKind.Newline)
Collaborator

I am a bit confused now about when newlines and spaces are skipped and when they are emitted. next calls skipWhitespace, but not always. Perhaps a brief comment on whitespace handling would be nice?

Contributor Author

I redid this part; whitespace should now always be emitted. :)

Collaborator

Nice, seems reasonable
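The "always emit whitespace" lexing can be sketched as follows, loosely mirroring the quoted cases. A run of spaces and tabs becomes one Space token, and `\n` or `\r\n` becomes one Newline token. `Kind`, `Token` and `lex` are hypothetical simplifications, not the Effekt lexer. Identifiers and numbers are lumped into `Word`, and every other character becomes `Other`.

```scala
// Hypothetical token kinds; `Word` stands in for identifiers and numbers.
enum Kind:
  case Word, Space, Newline, Other

// `end` is inclusive, mirroring `Token(start, position.offset - 1, kind)` in the snippet above.
case class Token(start: Int, end: Int, kind: Kind)

// Every character ends up in some token, so whitespace is always emitted, never skipped.
def lex(src: String): Vector[Token] =
  val out = Vector.newBuilder[Token]
  var i = 0
  // Character `k` positions ahead, or NUL past the end of the input.
  def peek(k: Int): Char = if i + k < src.length then src(i + k) else '\u0000'
  while i < src.length do
    val start = i
    val kind = (peek(0), peek(1)) match
      case (' ' | '\t', _) =>
        // a run of spaces and tabs becomes a single Space token
        while peek(0) == ' ' || peek(0) == '\t' do i += 1
        Kind.Space
      case ('\n', _) =>
        i += 1
        Kind.Newline
      case ('\r', '\n') =>
        // Windows line ending: two characters, one Newline token
        i += 2
        Kind.Newline
      case (c, _) if c.isLetterOrDigit =>
        while peek(0).isLetterOrDigit do i += 1
        Kind.Word
      case _ =>
        i += 1
        Kind.Other
    out += Token(start, i - 1, kind)
  out.result()

@main def lexDemo(): Unit =
  lex("val x = 1 +\r\n  2").foreach(println)
```

Keeping whitespace in the stream means the lexer makes no decision about spacing. The parser can then decide where whitespace matters, which is the split favoured in the review above.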

Member

@marvinborner left a comment

Some people (and formatters) like to omit the spaces to make operator precedence clearer to the reader (see PEP 8: "use your own judgement" 😉). Especially with more complex precedence rules (e.g., for parsing or chaining, such as an arrow for computation UFCS), we might want to relax the whitespace requirement.

Just want to leave this comment here, I'm fine with this for now though :)

@jiribenes
Contributor Author

I have changed it to only emit a warning :)

@jiribenes
Contributor Author

cc @marvinborner: do your feelings change now that it's merely a warning, one that doesn't crash and even lets you run a program that has parser warnings? :) [see tests]
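The difference between soft-failing and warning can be sketched with a diagnostics collector. Only the names warn and softFail and their (message, from, to) argument shape come from the snippets in this thread. `Reporter`, `Severity`, `Diagnostic` and `canRun` are hypothetical and do not describe Effekt's actual reporting API.

```scala
import scala.collection.mutable.ArrayBuffer

enum Severity:
  case Warning, Error

case class Diagnostic(severity: Severity, message: String, from: Int, to: Int)

// Hypothetical reporter: both kinds are recorded and parsing continues (recovery),
// but only errors prevent the program from being run.
final class Reporter:
  private val buffer = ArrayBuffer.empty[Diagnostic]

  def warn(message: String, from: Int, to: Int): Unit =
    buffer += Diagnostic(Severity.Warning, message, from, to)

  def softFail(message: String, from: Int, to: Int): Unit =
    buffer += Diagnostic(Severity.Error, message, from, to)

  def diagnostics: List[Diagnostic] = buffer.toList

  // Warnings alone never block running the program.
  def canRun: Boolean = !buffer.exists(_.severity == Severity.Error)
```

In this model, downgrading the whitespace check from softFail to warn keeps the message but leaves canRun true.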

Comment on lines +1071 to +1079
// Check that the current token is surrounded by whitespace. If not, emit a warning.
private def checkBinaryOpWhitespace(): Unit = {
  val wsBefore = position > 0 && isSpace(tokens(position - 1).kind)
  val wsAfter = position + 1 < tokens.length && isSpace(tokens(position + 1).kind)

  if (!wsBefore || !wsAfter) {
    warn("Missing whitespace around binary operator", position, position)
  }
}
Contributor Author

@jiribenes, Dec 10, 2025

We could ever-so-slightly generalise this:

  // Require that the current token is surrounded by whitespace. If not, emit a warning.
  private def requireSpacesAround(): Unit = {
    val wsBefore = position     > 0             && isSpace(tokens(position - 1).kind)
    val wsAfter  = position + 1 < tokens.length && isSpace(tokens(position + 1).kind)

    if (!wsBefore || !wsAfter) {
      warn(s"Missing whitespace around '${TokenKind.explain(peek.kind)}'", position, position)
    }
  }

and then use it in other places where we really want spaces on both sides (like the = in val foo = ..., def foo = ..., and so on) :)
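The generalised check can also be sketched as a pure function over a token vector. `Kind`, `explain` (a stand-in for TokenKind.explain) and `spacingWarnings` are hypothetical. For brevity, the sketch checks every = and + token in a flat stream. A real parser would instead call requireSpacesAround at the grammar points where it consumes those tokens.

```scala
enum Kind:
  case Val, Ident, Equals, Plus, Num, Space, Newline

// Hypothetical stand-in for TokenKind.explain: how a token kind is named in messages.
def explain(kind: Kind): String = kind match
  case Kind.Equals => "="
  case Kind.Plus   => "+"
  case other       => other.toString.toLowerCase

def isSpace(kind: Kind): Boolean = kind == Kind.Space || kind == Kind.Newline

// Returns a warning message if the token at `pos` lacks whitespace on either side.
def requireSpacesAround(tokens: Vector[Kind], pos: Int): Option[String] =
  val wsBefore = pos > 0 && isSpace(tokens(pos - 1))
  val wsAfter  = pos + 1 < tokens.length && isSpace(tokens(pos + 1))
  Option.when(!wsBefore || !wsAfter)(s"Missing whitespace around '${explain(tokens(pos))}'")

// Applies the same check to both binary operators and the `=` of definitions.
def spacingWarnings(tokens: Vector[Kind]): List[String] =
  tokens.indices.toList
    .filter(i => tokens(i) == Kind.Equals || tokens(i) == Kind.Plus)
    .flatMap(i => requireSpacesAround(tokens, i))
```

For `val x=1`, written as `Val, Space, Ident, Equals, Num`, the only warning is about `=`. Because the message is built from the token kind, one helper can serve every call site.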

@marvinborner
Member

Fine by me, sure. In the future we could maybe add flags to hide warnings, including such parser warnings, if desired.

@jiribenes changed the title from "Enforce whitespace around binary operators" to "Improve whitespace handling in Lexer & Parser, warn on bad whitespace around binops" on Dec 10, 2025
@jiribenes merged commit e4f8222 into master on Dec 10, 2025
8 checks passed
@jiribenes deleted the parser/force-binop-spaces branch on December 10, 2025 13:50


Development

Successfully merging this pull request may close these issues.

x!=y interpreted as x! = y due to ! being allowed in identifiers

4 participants